Take-home Ex03

Author

Li Ziyi

Published

February 7, 2023

Modified

February 13, 2023

1 Overview

For this assignment, the salient patterns of the resale prices of public housing properties by residential towns and estates in Singapore will be explored using dataset taken from Data.gov.sg

1.1 Loading libraries

For this exercise,

  • tidyverse is the main package to be used for data processing

  • DT is the package to be used for interactive data preview

  • ggstatsplot is the packge to be used for statistical analysis and visualisation

  • ggiraph, gganimate and gifski to enable interactive data visualisation

  • Some other packages for make-up mainly

pacman::p_load(tidyverse,
               DT,
               ggstatsplot,
               ggiraph,
               gganimate,
               gifski,
               ggthemes)

1.2 Data reading

flat_full <- read_csv("Data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
DT::datatable(flat_full,
              class ="cell-border stripe")

Through the preview of the dataset, certain data processing would be beneficial for easier analysis later on.

1.3 Data processing

flat_temp <- flat_full %>%
  separate(month,
           into = c("year_", "month_"),
           sep = "-",
           convert = TRUE) %>% 
  mutate(price_psqm = resale_price / floor_area_sqm)

flat_temp["rem_lease_yrs"] = (99 - (2022 - flat_temp$lease_commence_date))
DT::datatable(flat_temp,
                class ="cell-border stripe")

2 Selection and design consideration

p1 <- ggplot(flat_temp,
           aes(x = factor(year_),
               fill = flat_type)) +
  geom_bar_interactive(aes(tooltip = flat_type)) +
  labs(title = "Number of resale flat transactions from 2017 to 2023",
       x = "Year",
       y = "Number of transactions",
       fill = "Flat type") + 
  theme_economist() +
  theme(axis.title.y = element_text(vjust = 2.5,
                                    size = 12),
        axis.title.x = element_text(vjust = -2.5,
                                    size = 12))

girafe(
  ggobj = p1,
  width_svg = 12,
  height_svg = 6
)

From the visualisation above, the top three resale transactions came from 3, 4 and 5-room flats each year. Since 2023 has just started, the main focus of this study would zoom into resales transactions from 3, 4 and 5-room flats in the year of 2022.

flat_2022_3types <- flat_temp %>% 
  filter(year_ == "2022",
         flat_type %in% c("3 ROOM", "4 ROOM", "5 ROOM")) %>% 
  mutate(mth_abb = month.abb[month_])

The resale transaction price and its trend back in 2022 would be studied. To have a more meaningful comparison, median values would be used for most of studies below, instead of average values. This is to avoid the sensitivity impact driven by the extreme values on average values. Furthermore, price per square meters (psm) would be calculated to normalise for some comparison across different flat types. From there, some of factors like town location, storey and their impact on the unit price would be investigated.

3 Sketches of design

PICS TO BE INSERTED HERE

4 Final visualisation

4.1 Three-sample mean test

ggbetweenstats(
  data = flat_2022_3types,
  x = flat_type,
  y = resale_price,
  type = "np",
  plot.type = "boxviolin",
  title = "Non-parametric mean test for 3, 4 & 5-room HDB",
  xlab = "Flat type",
  ylab ="Resale price") +
  theme_economist()

Using a non-parametric test, from the result p = 0 < 0.05, it can be concluded that the resale price distribution does not follow a normal distribution.

4.2 Median resale price trend in 2022

flat_by_mth <- flat_2022_3types %>% 
  group_by(flat_type, month_) %>% 
  summarise(total_sales = n(),
            median_sales_price = median(resale_price),
            median_house_size_sqm = median(floor_area_sqm),
            median_remaining_lease_yrs = median(rem_lease_yrs),
            max_sales_price = max(resale_price),
            min_sales_price = min(resale_price)) %>% 
  arrange(desc(total_sales))
flat_by_mth$tooltip <- c(paste0(
  "Flat type: ", flat_by_mth$flat_type,
  "\n Median resales price: ", flat_by_mth$median_sales_price
))

p2 <- ggplot(data = flat_by_mth,
             aes(x = month_, 
                 y = median_sales_price, 
                 colour = flat_type)) +
        geom_point_interactive(aes(tooltip = flat_by_mth$tooltip)) +
        geom_smooth() +
        scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
        scale_x_continuous(breaks = seq_along(month.abb),
                           labels = month.abb) +
        theme_economist() +
        theme(axis.title.y = element_text(vjust = 2.5),
              legend.position = "right") +
  labs(x = NULL,
       y = "Median resale price") +
  ggtitle("Median resale price trend in 2022 for 3, 4 and 5 ROOM HDB")


girafe(
  ggobj = p2,
  width_svg = 12
)

Based on the point plot, it is observed that the median resale transaction price for all three flat types has been on an increasing trend throughout the whole year of 2022. This indeed aligns with our experience for the year. during which most of room rental or purchase contracts have seen significant surges.

Besides, the price increment from adding one more room is roughly $100,000.

4.3 Resale price per square meter by town for each flat type

ggplot(flat_2022_3types,
       aes(x = town, 
           y = price_psqm,
           color = town)) +
  geom_boxplot() +
  # geom_label_repel(aes(label = block),
  #                  max.overlaps = 5000) +
  facet_wrap(~ flat_type, nrow = 3) +
  labs(title = "Flat Resale Price (psm) by Town for 3, 4, 5-room HDB flats",
       x = NULL,
       y = "Resale Price ($ psm)") +
  guides(color = FALSE) +
  theme_economist() +
  theme(plot.title = element_text(size=16, hjust=0.5),
        axis.text.x = element_text(vjust = 0.5,
                                   angle = 60,
                                   size = 8),
        axis.title.y = element_text(size=15,
                                    vjust = 2.5),
        panel.spacing = unit(3, "line"))

Using the chart, the resale price psm for one town remain largely the same across three flat types.

For 3-room type, the median resale price psm are rather close for all towns. The maximum resale price psm for 3-room type was seen in Bukit Merah area and the minimum resale unit price was most likely seen in Toa Payoh.

For 4-room type, the median resale price psm in Central area and in Queenstown were rather higher than most of the rest areas.

For 5-room type, the median resale price psm in Central area were extremely high, while the rest of areas’ were relatively close.

What’s more, the price psm for each flat type varies widely even within the same area, as can be seen from the difference between max and min values.

4.4 Median resales price per square meter by town for each storey range

flat_heatmap <- flat_2022_3types %>% 
  group_by(town, storey_range) %>% 
  summarise(total_sales = n(),
            median_sales_price = median(resale_price),
            median_house_size_sqm = median(floor_area_sqm),
            median_remaining_lease_yrs = median(rem_lease_yrs)) %>% 
  arrange(desc(total_sales))
heatmap <- ggplot(data = flat_heatmap, 
                  mapping = aes(x = town, 
                                y = storey_range,
                                fill = median_sales_price)) +
            geom_tile() +
  labs(title = "Median resales price per square meter by town and storey for 3, 4 & 5-Room HDB flat", 
       x = NULL, 
       y = "Storey") +
  scale_fill_gradient(name = "Price per square meter",
                      low = "#F2F2F2",
                      high = "#00532F")+
  theme_economist() +
  theme(axis.text.x = element_text(angle = 60,
                                   vjust = 0.5,
                                   size = 8),
        axis.title.y = element_text(vjust = 2.5),
        legend.position = "right",
        legend.text = element_text(size = 6),
        legend.title = element_text(size = 8))

heatmap

Using the heatmap, it is generally true that higher storey units tends to have a better price psm, as illustrated from the darker green on higher storeys.

However, for some unpopular towns like Choa Chu Kang, Woodlands, Jurong West, Pasir Ris, Punggol, Sembawang and Yishun, the price psm difference do not seem significant between higher and lower storey units.

Additionally, it can be observed that resale flats in Bukit Timah and Yishun are much lower rised compared to the rest of towns. Especially, the highest unit observed in Bukit Timah was only 15-storey. On the other hand, resales flats in Central Area are the highest with storey higher than 50 available. Besides, the price psm for 30-stroey in Central Area were mostly highest comparing to other towns.

4.5 Resales price per square meter against remaining lease years throughout 2022

p3 <-
ggplot(flat_2022_3types,
       aes(x = rem_lease_yrs,
           y = price_psqm,
           colour = flat_type)) +
  geom_point() +
  labs(title = "Resale price per square meter against remaining lease years",
       x = "Remaining Lease (Years)",
       y = "Resale price ($ psm)",
       fill ="Flat type",
       caption = "Month of 2022: {frame_time}") +
  theme_economist() +
  geom_smooth(method="lm",
              se = FALSE,
              color = "mediumorchid1",
              formula = y ~ x) +
  theme(plot.title = element_text(hjust = 0),
        axis.title.x = element_text(size = 12,
                                    vjust = 0.5),
        axis.title.y = element_text(size = 12,
                                    vjust = 2.5),
        legend.position = "none",
        panel.spacing = unit(2, "line")) +
  facet_grid(flat_type ~ .) +
  transition_time(flat_2022_3types$month_) +
  ease_aes('linear')

animate(p3,
        nframes = 12,
        fps = 0.5)

According to the chart, throughout 2022 from Janurary to December, it’s certain that the resale price psm increases along the number of years remaining on the lease.

What is also observed was that most of resale transactions for those with less than 50 years on the lease were 3-room flat mainly.

Furthermore, relatively more transactions are seen particularly for those with the remaining lease above than 90 years.

5 Insights and thoughts

5.1 Most selling units in 2022

Price increase observed from 3, 4 and 5 ROOM HDB median prices within 2022

To be continued…

6 Conclusion